28 research outputs found

    Verb selection preferences: a computational approach

    This work aims to provide a representation of verb selection preferences for Italian. The experiment follows corpus-based methodologies and proceeds in two steps: first, the extraction of verb arguments from the selected corpora; then, the generalization of selection preferences using a lexical ontology. The resources used are LexIt, a valency lexicon for Italian verbs, as the lexical resource, and MultiWordNet as the ontology. The aim is to provide a detailed level of representation of verbal behaviour by navigating the entire semantic network and bringing out more specific behaviours in the selection preferences of verb arguments

    Work Hard, Play Hard: Collecting Acceptability Annotations through a 3D Game

    Corpus-based studies on acceptability judgements have always stimulated the interest of researchers in both theoretical and computational fields. Some approaches have focused on spontaneous judgements collected through different types of tasks, others on data annotated through crowd-sourcing platforms, and still others on expert-annotated data available from the literature. The release of the CoLA corpus, a large-scale corpus of sentences extracted from linguistic handbooks as examples of acceptable and unacceptable phenomena in English, has revived interest in the reliability of judgements by linguistic experts versus non-experts. Several issues are still open. In this work, we contribute to this debate by presenting a 3D video game that was used to collect acceptability judgements on Italian sentences. We analyse the resulting annotations in terms of agreement among players and by comparing them with experts' acceptability judgements. We also discuss different game settings to assess their impact on participants' motivation and engagement. The final dataset, containing 1,062 sentences selected by majority voting, is released for future research and comparisons

    Multi-Word Expressions in spoken language: PoliSdict

    The term multiword expression (MWE) refers to a group of words with a unitary meaning that cannot be inferred from the meanings of its component words, both in current usage and in technical and specialized languages. In this paper, we describe PoliSdict, an Italian electronic dictionary composed of MWEs occurring in spontaneous speech, automatically extracted from a multimodal corpus of political speech that is currently being developed at the "Maurice Gross" Laboratory of the Department of Political, Social and Communication Sciences of the University of Salerno, thanks to funding from the company Network Contacts. We introduce the creation methodology and the first results of a systematic analysis that considered terminological labels, frequency labels, and recurring syntactic patterns, and we propose an associated ontology

    ELECTRA for Neural Coreference Resolution in Italian

    In recent years, Neural Language Models have changed every field of Natural Language Processing. In this scenario, coreference resolution has been among the least considered tasks, especially in languages other than English. This work proposes a coreference resolution system for Italian, based on a neural end-to-end architecture that integrates the ELECTRA language model and is trained on OntoCorefIT, a novel Italian dataset built starting from OntoNotes. Although some approaches for Italian have been proposed in the last decade, to the best of our knowledge this is the first neural coreference resolver aimed specifically at Italian. The performance of the system is evaluated with respect to three different metrics and is also assessed by replacing ELECTRA with the widely used BERT language model, since BERT has proven effective for coreference resolution in English. A qualitative analysis has also been conducted, showing how different grammatical categories affect performance in an inflectional and morphologically rich language like Italian. The overall results show the effectiveness of the proposed solution and provide a baseline for future developments of this line of research in Italian

    Proceedings of the Fifth Italian Conference on Computational Linguistics CLiC-it 2018

    On behalf of the Program Committee, a very warm welcome to the Fifth Italian Conference on Computational Linguistics (CLiC-it 2018). This edition of the conference is held in Torino. The conference is locally organised by the University of Torino and hosted in its prestigious main lecture hall, "Cavallerizza Reale". The CLiC-it conference series is an initiative of the Italian Association for Computational Linguistics (AILC) which, after five years of activity, has clearly established itself as the premier national forum for research and development in the fields of Computational Linguistics and Natural Language Processing, where leading researchers and practitioners from academia and industry meet to share their research results, experiences, and challenges

    Le preferenze di selezione verbali: un approccio computazionale / Verb selection preferences: a computational approach

    This work aims to provide a representation of verb selection preferences for Italian. The experiment follows corpus-based methodologies and proceeds in two steps: first, the extraction of verb arguments from the selected corpora; then, the generalization of selection preferences using a lexical ontology. The resources used are LexIt, a valency lexicon for Italian verbs, as the lexical resource, and MultiWordNet as the ontology. The aim is to provide a detailed level of representation of verbal behaviour by navigating the entire semantic network and bringing out more specific behaviours in the selection preferences of verb arguments

    Verb selection preferences: a computational approach

    This work aims to provide a representation of verb selection preferences for Italian. The experiment follows corpus-based methodologies and proceeds in two steps: the extraction of verb arguments from the selected corpora and the generalization of selection preferences using a lexical ontology. The resources used are LexIt, a valency lexicon for Italian verbs, as the lexical resource, and MultiWordNet as the ontology. The aim is to provide a detailed level of representation of verbal behaviour by navigating the entire semantic network and bringing out more specific behaviours in the selection preferences of verb arguments

    Modeling entity types in faceted lightweight ontologies

    This work analyses various approaches to knowledge representation, in particular approaches that manage diversity in knowledge. It presents an approach based on faceted lightweight ontologies, together with case studies on modelling entity types for events, mind products and information objects

    Developing an annotator for Latin texts using Wikipedia

    This work investigates the feasibility of using Wikipedia as a resource for annotating Latin texts. Although Wikipedia is an excellent resource from which to extract many kinds of information (morphological, syntactic and semantic) for NLP tasks on modern languages, it has rarely been applied to NLP tasks for Latin. The work presents the first steps in the development of a POS tagger based on the Latin version of Wiktionary and of a Wikipedia-based semantic annotator

    Le preferenze di selezione verbali: un approccio computazionale

    This work aims to provide a representation of verb selection preferences for Italian. The experiment follows corpus-based methodologies and proceeds in two steps: the extraction of verb arguments from the selected corpora and the generalization of selection preferences using a lexical ontology. The resources used are LexIt, a valency lexicon for Italian verbs, as the lexical resource, and MultiWordNet as the ontology. The aim is to provide a detailed level of representation of verbal behaviour by navigating the entire semantic network and bringing out more specific behaviours in the selection preferences of verb arguments